feat(examples): add custom HTTP embedding example for LM Studio / Ollama #149
cluster2600 wants to merge 7 commits into alibaba:main
Conversation
Thank you for your submission! A service-based invocation model like this, which helps zvec achieve RAG capability, is something we currently lack. See https://zvec.org/api-reference/python/extension/#zvec.extension.DenseEmbeddingFunction
Thanks for the feedback! Moved the implementation into python/zvec/extension/.
Move the HTTP embedding implementation from the example script into python/zvec/extension/ as HTTPDenseEmbedding, inheriting from DenseEmbeddingFunction. The example now imports from zvec.extension instead of defining the class inline.
Signed-off-by: Maxime <maxime@cluster2600.com>
Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
Force-pushed 31f67ed to 9a81b28
Move zvec imports to top level, add noqa for print statements, replace os.path.exists with pathlib, fix import sorting.
Signed-off-by: Maxime <maxime@cluster2600.com>
Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
The vector_column_indexer_test failure is a known flaky assertion in hnsw_streamer_entity.h, unrelated to the Python-only changes in this PR.
Signed-off-by: Maxime <maxime@cluster2600.com>
Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
@greptile
Greptile Summary

This PR adds a custom HTTP embedding example for LM Studio / Ollama. Key additions:
The code is production-ready, well-documented, and provides valuable functionality for users running local inference servers.

Confidence Score: 5/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant HTTPDenseEmbedding
    participant Cache
    participant Server as Local Server<br/>(LM Studio/Ollama)
    User->>HTTPDenseEmbedding: __init__(base_url, model)
    HTTPDenseEmbedding->>HTTPDenseEmbedding: Store config
    User->>HTTPDenseEmbedding: dimension property
    HTTPDenseEmbedding->>HTTPDenseEmbedding: embed("dimension probe")
    HTTPDenseEmbedding->>Server: POST /v1/embeddings
    Server-->>HTTPDenseEmbedding: {data: [{embedding: [...]}]}
    HTTPDenseEmbedding->>Cache: Store result
    HTTPDenseEmbedding-->>User: vector dimension
    User->>HTTPDenseEmbedding: embed("user text")
    HTTPDenseEmbedding->>Cache: Check cache
    alt Cache hit
        Cache-->>HTTPDenseEmbedding: Cached vector
    else Cache miss
        HTTPDenseEmbedding->>Server: POST /v1/embeddings
        Server-->>HTTPDenseEmbedding: {data: [{embedding: [...]}]}
        HTTPDenseEmbedding->>Cache: Store result
    end
    HTTPDenseEmbedding-->>User: vector
```
Last reviewed commit: eb3960e
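The cached embed flow in the sequence diagram can be sketched with only the Python stdlib. The class name matches the PR, but the method names, defaults, and endpoint root are assumptions for illustration, not the merged implementation:

```python
import json
import urllib.request
from functools import lru_cache

class HTTPDenseEmbedding:
    """Sketch of an OpenAI-compatible HTTP embedder (names/defaults assumed)."""

    def __init__(self, base_url="http://localhost:1234/v1",
                 model="text-embedding-nomic-embed-text-v1.5", api_key=""):
        self.base_url = base_url.rstrip("/")
        self.model = model
        self.api_key = api_key  # local servers typically ignore the bearer token

    @lru_cache(maxsize=1024)  # repeated texts hit the server only once
    def embed(self, text):
        payload = json.dumps({"model": self.model, "input": text}).encode()
        req = urllib.request.Request(
            f"{self.base_url}/embeddings",
            data=payload,
            headers={"Content-Type": "application/json",
                     "Authorization": f"Bearer {self.api_key}"},
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        return body["data"][0]["embedding"]

    @property
    def dimension(self):
        # probe the server once; lru_cache keeps the probe result around
        return len(self.embed("dimension probe"))
```

Note that `lru_cache` on a method keys the cache on `(self, text)`, so cached vectors live for the lifetime of the instance.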
Per maintainer feedback, examples requiring an external LLM server belong in the zvec-web project rather than in this repository. Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
Removed the example file as requested; server-dependent examples belong in zvec-web.
Summary
This PR adds a self-contained example showing how to use any OpenAI-compatible HTTP embedding endpoint (LM Studio, Ollama, vLLM, LocalAI, …) as the embedding source in zvec.
What's added
- examples/custom_http_embedding.py: an HTTPEmbeddingFunction that calls the /v1/embeddings endpoint, caches results with @lru_cache, and satisfies the DenseEmbeddingFunction protocol.
- --base-url, --model, --api-key, --collection-path flags for easy customisation.

Usage
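The flags might be wired up like this. The flag names come from the PR description; the defaults shown here are assumptions (LM Studio's default port, a placeholder model and path), not the script's actual values:

```python
import argparse

parser = argparse.ArgumentParser(
    description="Embed text via an OpenAI-compatible HTTP endpoint")
parser.add_argument("--base-url", default="http://localhost:1234/v1",
                    help="Root of the OpenAI-compatible API (LM Studio default port)")
parser.add_argument("--model", default="text-embedding-nomic-embed-text-v1.5",
                    help="Embedding model name as the server reports it")
parser.add_argument("--api-key", default="",
                    help="Bearer token; local servers usually ignore it")
parser.add_argument("--collection-path", default="./zvec_demo",
                    help="Where the zvec collection is stored on disk")

# Pass an explicit argv list here for demonstration; drop it to read sys.argv.
args = parser.parse_args(["--model", "nomic-embed-text"])
```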
Motivation
The existing extensions (OpenAIDenseEmbedding, etc.) require the openai package and are primarily designed for cloud APIs. Many developers want to use local inference servers without extra dependencies. This example shows the pattern using only the Python stdlib, making it easy to adapt or inline.

Testing
The example runs end-to-end against a live LM Studio instance on localhost:1234. No new test infrastructure is required for a standalone script.
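When a live server is not available, a throwaway stdlib stub can stand in for smoke tests. Everything here is a sketch: the handler, port choice, and fixed three-element vector are invented, with the response shape mirroring the OpenAI-compatible /v1/embeddings payload:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

class _StubHandler(BaseHTTPRequestHandler):
    """Answers any POST with a canned OpenAI-style embeddings payload."""

    def do_POST(self):
        # drain the request body so the connection stays well-behaved
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        body = json.dumps({"data": [{"embedding": [0.1, 0.2, 0.3]}]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep test output quiet
        pass

def smoke_test():
    server = HTTPServer(("127.0.0.1", 0), _StubHandler)  # port 0: OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/v1/embeddings"
    req = Request(url,
                  data=json.dumps({"model": "stub", "input": "hi"}).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        vec = json.load(resp)["data"][0]["embedding"]
    server.shutdown()
    return vec
```

This keeps the HTTP path exercised in CI without depending on LM Studio being installed.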